T. pallidum is one of the few human bacterial pathogens that have not been cultivated in vitro. It remains an enigmatic pathogen, since 
few of its virulence factors have been identified and the pathogenesis of the disease is poorly understood. Experimental approaches for 
definitively identifying virulence determinants, such as evolutionary or mutation analysis and complementation, are still in their infancy. 
Whole genome sequencing of the available Treponema subspecies, followed by comparative analysis of the genome sequences, is a promising 
approach in this context. Divergence within a species is caused mainly by variation in gene and protein sequences, but also by differences 
in the set of genes present in a particular species. Proteins that are specific to a particular species may be responsible for its adapted 
phenotype, e.g. its ability to act as a pathogen or its resistance to a certain drug. Identifying species-specific proteins is thus a 
relevant aim, and here we make a small contribution towards its achievement. In the present work, we compared the genomes, genes and 
proteins of five different Treponema subspecies by extracting numerous protein sequence properties using state-of-the-art Support Vector 
Machines. The genomes of the Treponema pallidum subspecies were sequenced to study the gene properties (Comparative Genome Sequencing, CGS) 
of about 5016 protein-coding genes. The sequences were compared with a multiple sequence alignment tool, and the results were filtered for 
sequences with 100 percent and 99 percent similarity. Functional sites of these sequences were predicted with the help of PROSITE scan. 
When compared to the heterogeneity in the T. pallidum chromosome. To our surprise, we find that proteins of different species are 
significantly correlated and can be distinguished on the basis of sequence properties and functional sites encoded within their genomes. 
This discrimination does not rely on any homology criteria but is based only on the biophysical characteristics encoded in the sequence. 
We have also constructed a phylogenetic tree based on the results of the comparisons, and compared it to the well-documented phylogeny. 
The observed gene and protein comparison is the first assessment of the degree of variation between the five T. pallidum subspecies and 
hence paves the way for phylogenetic studies of these enigmatic organisms. Moreover, the divergence in genome, genes and proteins more 
often belonged to the group of genes with predicted virulence and unknown functions, suggesting their involvement in differences in 
infection such as yaws or syphilis. 


Keywords: T. pallidum subspecies; Comparative Genomics; GeneWiz Browser; PROSITE; Cladogram. 


Abbreviations: GW - GeneWiz, Bp - Base Pairs, AA - Amino Acids, CGS - Comparative Genome Sequencing. 


Homology: 

The relationship among sequences due to descent from a common 
ancestral sequence. An important organizing principle for genomic 
studies because structural and functional similarities tend to 
change together along the structure of homology relationships. 
Both comparative gene analysis and comparative analysis of 
the proteins encoded in the complete genome of an organism 
reveal novel and unique species-specific information, even 
though the techniques are still in their infancy and have yet 
to reach their full potential. Comparative genomics is a 
powerful tool for understanding the underlying mechanisms 
of evolution, pathogenesis and adaptive strategies in 
emerging non-culturable pathogens such as spirochetes. 
T. pallidum is one of the few unusual human pathogens that 
have not been cultured continuously in vitro. It is a 
Gram-negative spirochaete bacterium whose subspecies 
cause treponemal diseases such as syphilis, bejel, pinta and 
yaws. The treponemes have a cytoplasmic and an outer 
membrane. Five subspecies of Treponema pallidum, namely 
Treponema pallidum subsp. pallidum DAL1, Treponema 
pallidum subsp. pallidum SS14, Treponema pallidum subsp. 
pallidum str. Chicago, Treponema pallidum subsp. pallidum 
str. Nichols and Treponema pallidum subsp. pertenue str. 
CDC2, were compared on the basis of some genomic 
properties and various types of functional proteins. Closely 
related species reveal species-specific differences and 
evolutionary selection pressures on genes (Lukashin and 
Borodovsky, 1998). At the same time, a comparative 
sequence analysis provides the means for a better 
annotation. The spirochetal morphology of T. pallidum, the 
absence of lipopolysaccharide and an outer membrane that 
contains relatively few intramembranous proteins impose 
several limitations on research into this organism. With the 
objective of gaining more insight into the functional elements 
of the genomes of the Treponema species, the present study 
reveals the comparative relationships among the predicted 
ORFs and the protein sequences, with their functional sites, 
encoded in the complete genomes of Treponema pallidum 
spp. Conclusions from such comparative analysis augment 
the understanding of the host-parasite interactions that 
enable pathogens to carve out unique ecological niches in 
nature. In the present study, we summarize the findings for 
each genome along with a computational comparative 
analysis of the five different subspecies of T. pallidum that 
can provide further insight into species and strain 
uniqueness and, importantly, can stimulate new studies 
leading to new approaches to disease prevention and 
treatment (Lowe and Eddy, 1997). 


The present comparative analysis among the Treponema 
subspecies that are pathogenic to humans aims to identify 
the similarities and differences among the genes encoded in 
their genomes. The comparative study also aims to 
SCIENTOMETRIC 


Interpreting the functional content of a given genomic sequence is one of the central challenges of biology today. Perhaps the most promising 
approach to this problem is based on the comparative method of classic biology in the modern guise of sequence comparison. For instance, protein- 
coding regions tend to be conserved between species. Hence, a simple method for distinguishing a functional exon from the chance absence of stop 
codons is to investigate its homologue from closely related species. 


Citation analysis: 

It is the examination of the frequency, patterns, and graphs of 
citations in articles and books. It uses citations in scholarly 
works to establish links to other works or other researchers. 
Citation analysis is one of the most widely used methods of 
bibliometrics. 


Forward genetics: 

It involves studying genes one at a time. Only a small minority 
of genes are uniquely associated with an easily definable 
phenotype - a characteristic that is critical for determining 
gene function by forward genetics. 


reveal the genomic properties of the species, identify the 
identical sequences of proteins and analyze their functional 
sites. 


2.1 Scope of the Study 

The aim of the study is to identify the common and 
differentiating functional components encoded within the 
genomes of the five subspecies of Treponema. The 
literature indicates that Treponema pallidum subspecies 
are morphologically and serologically indistinguishable. The 
mode of transmission is not unique, and the course of each 
disease is significantly variable. The outer membrane of 
T. pallidum has too few surface proteins for an antibody to 
be effective. Thus, owing to this poor antigenicity, diagnosis 
and treatment through antibodies (vaccines) are difficult. 
Molecular analysis at the sequence level may help to explain 
the modes of pathogenicity and the clinical significance of 
these subspecies. The similarities and differences found by 
comparative genome, gene and protein analysis of the five 
subspecies may be useful for future research. 


2.2 Limitations of the Study 

• The study undertaken is limited to three years 

• Only the Treponema genomes that were available 
during the timeline of this research study were used 

• We did the citation analysis based on the secondary 
information available in the databases 

• In this study we did not include citation analysis 
based on in vitro findings 


3. MATERIALS AND METHODS 
3.1. Materials 


3.1.1. Genome Sequences 

Genomes of the Treponema pallidum subspecies, namely 
Treponema pallidum subsp. pallidum DAL1 (species A), 
Treponema pallidum subsp. pallidum SS14 (species B), 
Treponema pallidum subsp. pallidum str. Chicago (species 
C), Treponema pallidum subsp. pallidum str. Nichols 
(species D) and Treponema pallidum subsp. pertenue str. 
CDC2 (species E), were selected for analysis and 
abbreviated as above. Genome sequences of all the 
Treponema subspecies and genome statistics were 
collected from the Genome sequence database maintained 
at the National Center for Biotechnology Information 
(National Institutes of Health, Bethesda, Md.). This resource 
organizes information on genomes including sequences, 
maps, chromosomes, assemblies and annotations 
(http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome). 


3.2. Research Methodology 

3.2.1. Sequence analysis: detection and 
interpretation of varying levels of genome 
sequence similarity 

3.2.1.1. ClustalW Multiple Sequence Alignment Program 
ClustalW2 is a general purpose multiple sequence alignment 
program for DNA or proteins. It attempts to calculate the 
best match for the selected sequences and lines them up so 
that the identities, similarities and differences can be seen 
(http://www.ebi.ac.uk/Tools/msa/clustalw2/). 
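
The filtering of aligned sequences into 100% and 99% similarity classes can be illustrated with a minimal sketch. It assumes that similarity means percent identity over aligned columns in which neither sequence has a gap; the function names `percent_identity` and `similarity_bin` and the example sequences are hypothetical, and this is not the scoring code of ClustalW itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs are rows taken from the same alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def similarity_bin(score: float) -> str:
    """Assign a score to the two classes used in this study, or to neither."""
    if score == 100.0:
        return "100%"
    if score >= 99.0:
        return "99%"
    return "below 99%"


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the Treponema data.
    print(similarity_bin(percent_identity("MKT-LV", "MKTALV")))
    print(similarity_bin(percent_identity("A" * 99 + "C", "A" * 100)))
```

Excluding gap columns is a design choice; counting gaps as mismatches would lower the score for sequences of unequal length.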


3.2.1.2. Tree View Software 

Phylogenetic trees were constructed using the CLUSTALW 
programs (Sigrist et al., 2005) with the neighbor-joining and 
least squares (Fitch-Margoliash) methods, accompanied by 
bootstrap analysis (De Castro et al., 2005). Tree View is a 
program for displaying and printing phylogenies. The 
program reads most NEXUS tree files (such as those 
produced by PAUP and COMPONENT) and PHYLIP style 
tree files (including those produced by fastDNAml and 
CLUSTALW). 
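
PHYLIP style tree files store a tree as a Newick string. As a small illustration of that format, the sketch below lists the leaf labels of a Newick tree. The function name `newick_leaves`, the topology and the branch lengths in the example are hypothetical and are not the tree obtained in this study; quoted labels containing spaces or punctuation are not handled.

```python
import re


def newick_leaves(newick: str) -> list[str]:
    """Return the leaf labels of a Newick tree in the order they appear.

    A leaf label directly follows "(" or "," and runs up to the next
    ":", ",", ")" or ";". Labels of internal nodes follow ")" and are
    therefore skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


if __name__ == "__main__":
    # Hypothetical tree for illustration only.
    example = "((DAL1:0.1,SS14:0.1):0.05,(Chicago:0.1,Nichols:0.1):0.05,CDC2:0.3);"
    print(newick_leaves(example))
```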


3.2.1.3. GeneWiz browser 0.94 server 

GeneWiz browser 0.94 server is an interactive web 
application for visualizing genomic data of prokaryotic 
chromosomes. The tool allows users to carry out various 
analyses such as mapping alignments of homologous genes 
to other genomes, mapping of short sequencing reads to a 
reference chromosome and calculating DNA properties such 
as curvature or stacking energy along the chromosome 
(Tamura et al., 2007). The GeneWiz browser produces an 
interactive graphic that enables zooming from a global scale 
down to single nucleotides without changing the size of the 
plot. Its ability to disproportionally zoom provides optimal 
readability and increased functionality compared to other 
browsers. It allows the user to select the display of various 
genomic features such as color setting and data ranges. 
Custom numerical data can be added to the plot allowing, 
for example, visualization of gene expression and regulation 
data. Further, standard atlases are pre-generated for all 
prokaryotic genomes available in GenBank, providing a fast 
overview of all available genomes, including recently 
deposited genome sequences. The tool is available online 
from (http://www.cbs.dtu.dk/services/gwBrowser). 
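
Some of the base-composition properties displayed by such a browser can be computed directly from the sequence. The sketch below calculates GC skew, AT skew and the AT fraction in fixed windows, using the usual definitions (G - C)/(G + C) and (A - T)/(A + T). The function name `window_stats` and the window and step sizes are assumptions chosen for illustration; the windowing and smoothing used internally by the GeneWiz browser are not reproduced here.

```python
def window_stats(seq: str, window: int, step: int) -> list[tuple[int, float, float, float]]:
    """Return (start, gc_skew, at_skew, at_fraction) for each full window.

    gc_skew = (G - C) / (G + C), at_skew = (A - T) / (A + T); a skew is
    reported as 0.0 when its denominator is zero.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        a, t = chunk.count("A"), chunk.count("T")
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        at_skew = (a - t) / (a + t) if a + t else 0.0
        at_fraction = (a + t) / window
        results.append((start, gc_skew, at_skew, at_fraction))
    return results


if __name__ == "__main__":
    # Toy sequence for illustration; a real genome would use much larger windows.
    for row in window_stats("GGGCAATT", window=4, step=4):
        print(row)
```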


3.2.1.4. Microbial Genome Annotation Tools 
GLIMMER is a system for finding genes in microbial DNA, 
especially the genomes of bacteria and archaea. GLIMMER 
(Gene Locator and Interpolated Markov ModelER) uses 
interpolated Markov models to identify coding regions 
(Delcher et al., 1999), 
(http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi). 


3.2.2. Conservation and diversity of functional 
classes of proteins between the subspecies of 
Treponema 

Recent advances in high-throughput structural determination 
techniques and structural genomics initiatives have 
produced an increase in the volume of structural data for 
proteins prior to knowledge of their functions. With these 
advances, several tools have rapidly been developed to 
predict functions for proteins based on their sequence 
similarity. 


3.2.2.1. Prosite 

PROSITE is a database of protein families and domains; it 
currently contains patterns and profiles specific for more than 
a thousand protein families or domains. It is based on the 
observation that a large number of different proteins can be 
grouped, on the basis of similarities in their sequences, into a 
limited number of families. Proteins or protein domains 
belonging to a particular family generally share functional 
attributes and are derived from a common ancestor. The 
ProRule section of PROSITE consists of manually created 
rules that can automatically generate annotation in the 
UniProtKB/Swiss-Prot format based on PROSITE motifs. 
Most of the time, rules are based on PROSITE profiles, as 
they are more specific than patterns, but occasionally rules 
make use of patterns. In these cases, the rules will not work 
independently, but will be called by another rule, which will 
be triggered by a profile. In addition to these rules 
corresponding to a unique PROSITE motif, there are also 
rules triggered by a specific combination of PROSITE motifs 
called metamotifs. Metamotifs allow the definition of 
arrangements of domains separated by spacers of variable 
size, as well as the anchoring to the N- and/or C-termini and 
the exclusion of a PROSITE motif (Sigrist et al., 2010). 


TABLE 1 GENOME PROPERTIES 

Species | Size (Mb) | GC% | Gene | Protein 


ProRule is used to create UniProtKB/Swiss-Prot lines with 
basic and complex annotation derived from the presence of 
the domain and of biologically critical amino acids (domain 
name and boundaries, EC number, function, keywords, 
associated PROSITE patterns, PTMs, active sites, disulfide 
bonds, etc.). ProRule contains notably the position of 
structurally and/or functionally critical amino acid(s), as well 
as the condition(s) they must fulfil to play their biological 
role(s). Part of these supplementary data are used by 
ScanProsite, which not only provides the protein sequence 
matched by a profile, but also information about the 
relevance of biologically meaningful residues, like active 
sites, binding sites, post-translational modification sites or 
disulfide bonds, to help function determination. 
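
How a PROSITE pattern is matched against a protein sequence can be sketched by translating the pattern syntax into a regular expression. The example below covers only the basic pattern elements (`x`, `[...]`, `{...}`, repetition counts and the `<` and `>` anchors); PROSITE profiles and ProRule annotation are not covered, and this is not the ScanProsite implementation. The function names `prosite_to_regex` and `scan` and the test sequence are hypothetical; the pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern (PS00001).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")   # anchored to the N-terminus
        at_end = element.endswith(">")       # anchored to the C-terminus
        element = element.strip("<>")
        # Separate the residue specification from an optional "(n)" or "(n,m)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            regex = core                         # one residue or a [...] set
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_start else "") + regex + ("$" if at_end else ""))
    return "".join(parts)


def scan(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for all, possibly overlapping, hits."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(scan("N-{P}-[ST]-{P}", "MNGSAANPSQ"))  # hypothetical test sequence
```

A lookahead is used in `scan` so that overlapping hits are reported, which matters for short patterns of this kind.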


4. RESULTS AND DISCUSSION 


4.1. Comparative Genome Analysis 

Genome sequences of the five subspecies of Treponema 
pallidum were obtained from NCBI and analyzed by fully 
automated computational methods to compare the basic 
properties of the genes of these species. The size of the 
genome was found to be 1.4 Mb for all species under 
analysis. Table 1 shows the result of the comparative 
analysis of the subspecies: the GC content is almost the 
same, about 52.8%. The number of genes in species A, C 
and E was in the range of 1118 to 1122, whereas species B 
and D differed to some extent. The number of proteins was 
almost the same in species A and E, and likewise in species 
B and D. Species C showed a lower number of proteins, 
only 981. 


Figure 1 
Phylogenetic Tree Depicting the Relationships between T. 
pallidum subspecies (leaves: pallidum DAL-1, pallidum SS14, 
pallidum str. Chicago, pallidum str. Nichols, pertenue str. CDC2) 


Figure 2 
Genome map of Treponema pallidum DAL1 


4.2. Tree View Software Analysis 

In addition to species discrimination, it was interesting to 
explore whether using sequence features to discriminate 
between bacterial subspecies by machine learning would 
provide an accurate phylogenetic relationship between the 
subspecies, as documented in Fig. 1. 


4.3. Genewiz Browser Results 

4.3.1. Treponema pallidum subsp. pallidum 
DAL-1 

The Lineage: Bacteria - Spirochaetes - Spirochaetales - 
Spirochaetaceae; Treponema - Treponema pallidum - 
Treponema pallidum subsp. pallidum - Treponema pallidum 
subsp. pallidum DAL1. 


Treponema pallidum subsp. pallidum DAL1: This organism 
is the causative agent of endemic and venereal syphilis. 
This sexually transmitted disease was first discovered in 
Europe at the end of the fifteenth century; however, the 
causative agent was not identified until 1905. At one time 
syphilis was the third most commonly reported 
communicable disease in the USA. Syphilis is characterized 
by multiple clinical stages and long periods of latent, 
asymptomatic infection. Although effective therapies have 
been available since the introduction of penicillin, syphilis 
remains a global health problem. Treponema pallidum 
subsp. pallidum str. Dallas1: this strain will be used for 
comparative analysis. Fig. 2 shows the genome map of 
Treponema pallidum DAL1. 


• Lane 1 = feature lane (annotations) 
• Lane 2 = nucleotides 
• Lane 3 = intrinsic curvature 
• Lane 4 = stacking energy 
• Lane 5 = positional preferences 
• Lanes 6 and 7 = global direct repeats and global 
inverted repeats 
• Lane 8 = GC skew 
• Lane 9 = percent AT 
• Lanes 10, 11, 12 and 13 = A, T, G and C content 
respectively 
• Lanes 14, 15, 16 and 17 = AAAA, TTTT, GGGG 
and CCCC repeats respectively 
• Lane 18 = AT skew 
• Lanes 19 and 20 = direct repeats and simple 
repeats 


Figure 3 
Genome map of Treponema pallidum SS14 

Genes in the lanes are color-coded according to the 
following categories: 

• Wine Red = genes involved in central metabolism 
and respiration without orthologues in H. pylori 
• Cyan = methyl-accepting chemotaxis proteins 
(MCPs) 
• Dark Blue = Type IV secretion system 
• Sky Blue = genes involved in acid acclimation 
• Green = putative secreted virulence factors 
• Pale Green = glycosyltransferase gene cluster 
specific to H. bizzozeronii 
• Pale Grey = all other CDSs 

ACC, acetophenone carboxylase; comB, Type IV secretion 
system; NAP, periplasmic nitrate reductase; AHD, 
allophanate hydrolase; GT, glycosyltransferase; NRS, nitrite 
reductase system; SNO, S and N oxidases; FDH, formate 
reductase system; PL, polysaccharide lyase 


4.3.2. Treponema pallidum subsp. pallidum 
SS14 

The Lineage: Bacteria - Spirochaetes - Spirochaetales - 
Spirochaetaceae; Treponema - Treponema pallidum - 
Treponema pallidum subsp. pallidum - Treponema pallidum 
subsp. pallidum SS14. 

Treponema pallidum subsp. pallidum SS14: This organism 
is the causative agent of endemic and venereal syphilis. 
This sexually transmitted disease was first discovered in 
Europe at the end of the fifteenth century; however, the 
causative agent was not identified until 1905. At one time 
syphilis was the third most commonly reported 
communicable disease in the USA. Syphilis is characterized 
by multiple clinical stages and long periods of latent, 
asymptomatic infection. Although effective therapies have 
been available since the introduction of penicillin, syphilis 
remains a global health problem. Treponema pallidum 
subsp. pallidum SS14 was isolated in 1977 from a patient 
with secondary syphilis. This strain is less susceptible than 
the Nichols strain to a number of antibiotics and will be used 
for comparative analysis. Fig. 3 shows the genome map of 
Treponema pallidum SS14. 


4.3.3. Treponema pallidum subsp. pallidum str. 
Chicago 

The Lineage: Bacteria - Spirochaetes - Spirochaetales - 
Spirochaetaceae; Treponema - Treponema pallidum - 
Treponema pallidum subsp. pallidum - Treponema pallidum 
subsp. pallidum str. Chicago. 

Treponema pallidum subsp. pallidum str. Chicago: The 
availability of more Treponema pallidum genomes will 
greatly help comparative studies among isolates and will 
facilitate the improvement of typing methods and the 
identification of potential targets to be used as protective 
antigens. Fig. 4 shows the genome map of Treponema 
pallidum str. Chicago. 


4.3.4. Treponema pallidum subsp. pallidum str. 
Nichols 

The Lineage: Bacteria - Spirochaetes - Spirochaetales - 
Spirochaetaceae; Treponema - Treponema pallidum - 
Treponema pallidum subsp. pallidum - Treponema pallidum 
subsp. pallidum str. Nichols. 
Figure 4 
Genome map of Treponema pallidum str. Chicago 


Treponema pallidum subsp. pallidum: This organism is the 
causative agent of endemic and venereal syphilis. This 
sexually transmitted disease was first discovered in Europe 
at the end of the fifteenth century; however, the causative 
agent was not identified until 1905. At one time syphilis was 
the third most commonly reported communicable disease in 
the USA. Syphilis is characterized by multiple clinical stages 
and long periods of latent, asymptomatic infection. Although 
effective therapies have been available since the 
introduction of penicillin, syphilis remains a global health 
problem. Treponema pallidum subsp. pallidum strain 
Nichols: this strain was originally isolated in 1912 from a 
neurosyphilitic patient and is virulent. Fig. 5 shows the 
genome map of Treponema pallidum str. Nichols. 


4.3.5. Treponema pallidum subsp. pertenue str. 
CDC2 

The Lineage: Bacteria - Spirochaetes - Spirochaetales - 
Spirochaetaceae; Treponema - Treponema pallidum - 
Treponema pallidum subsp. pertenue - Treponema pallidum 
subsp. pertenue str. CDC2. 


Treponema pallidum subsp. pertenue: This subspecies 
causes a chronic and disfiguring illness called yaws. The 
disease starts as a skin infection causing persistent ulcers 
and progresses to form tumor-like masses. This disease 
tends to infect children and is common in rural areas in 
Africa, Southeast Asia and equatorial South America. 
Treponema pallidum subsp. pertenue str. CDC2: this strain 
was isolated in Akorabo, Ghana in 1980 and will be used for 
comparative analysis. Fig. 6 shows the genome map of 
Treponema pallidum str. CDC2. 

The collective analysis of the genome characterization of 
each subspecies obtained from the GeneWiz browser is 
summarized in Table 2. This table illustrates the [...] 
The protein sequences were obtained from NCBI genome 
browser for each Treponema pallidum subspecies. About 
[...] sequence alignment was executed using ClustalW 
software. Table 3 gives the total number of sequences of 
the five species having 100% similarity, based on related 
types of proteins. The analysis yielded 92 sequences from 
these five species that showed 100% similarity when 
matched with each other. Species D had the highest 
number, 43 sequences, matched with the other four species. 
Species C and D had the maximum of 13 sequences with a 
100% alignment score, while species D and E had 10 
aligned similar sequences. The pairs A and E, and B and D, 
were each found to have 10 sequences with completely 
similar protein sequences. Species B and E showed the 
lowest number, 6 sequences with an aligned score of 100. 
Table 3 shows the total number of protein sequences with 
an aligned score of 100% for the five Treponema pallidum 
subspecies. 
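
The pairwise tallies reported above can be illustrated with a short sketch that counts, for every pair of species, how many protein sequences are shared exactly. Exact string matching is used here as a simplified stand-in for a 100% alignment score, which is an assumption; the study itself used ClustalW scores. The function name `shared_identical` and the toy sequences are hypothetical.

```python
from itertools import combinations


def shared_identical(proteomes: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Count identical protein sequences shared by each pair of species.

    `proteomes` maps a species label (e.g. "A") to the set of its protein
    sequences. Pairs are reported in alphabetical order of the labels.
    """
    return {
        (first, second): len(proteomes[first] & proteomes[second])
        for first, second in combinations(sorted(proteomes), 2)
    }


if __name__ == "__main__":
    # Toy data for illustration only, not the Treponema proteomes.
    toy = {
        "A": {"MKT", "MLV"},
        "B": {"MKT"},
        "C": {"MLV", "MKT", "MQQ"},
    }
    for pair, count in shared_identical(toy).items():
        print(pair, count)
```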


4.4.1.2. Protein sequence with 99% Similarity 
The protein sequences were obtained from NCBI genome 
browser for each Treponema pallidum subspecies. About 
5061 sequences were compared to each other. Multiple 
sequence alignment was executed using ClustalW 
software. Table 4 gives the total number of sequences of 
the five species having 99% similarity, based on related 
types of proteins. The analysis yielded 13 sequences from 
these five species that showed 99% similarity when 
matched with each other. Species D and E had the most 
99% similar sequences, about 5. Species C and E had 4 
sequences with 99% similarity. Table 4 shows the total 
number of protein sequences with an aligned score of 99% 
for the five Treponema pallidum subspecies. 


Figure 5 
Genome map of Treponema pallidum str. Nichols 


Species, 2012, 1(1), 5-14, 
© The Author(s) 2012. Open Access. This article is licensed under a Creative Commons Attribution License 4.0 (CC BY 4.0). 


4.4.2. Comparative analysis of Protein Based 
on functional categories 

4.4.2.1. Distribution of Proteins (100% Similarity) 
based on Functional categories 

Using ClustalW software, 92 protein sequences were filtered 
on the basis of 100% similarity. From these 92 sequences, 
26 different types of proteins were categorized. Table 5 
details the presence of each specific type of protein in the 
individual subspecies among the 92 sequences having 
100% similarity. The analysis reveals that, among all similar 
proteins, ribosomal proteins L15 and L30 and replication 
initiator factor proteins were the most common to all five 
subspecies of Treponema pallidum. Aspartyl/glutamyl-tRNA 
amidotransferase subunit C and hypothetical proteins were 
the other two types of proteins commonly found in all five 
subspecies of Treponema. Special types of putative 
proteins were found in all five species with different 
functional proteins. Aspartyl/glutamyl-tRNA 
amidotransferase subunit A proteins and lipoproteins were 
observed in all species except species E. Methionine 
aminopeptidase was found in species B, C and D but not in 
A and E. Phosphoenolpyruvate carboxykinase was found in 
species C, D and E but not in species A and B. Table 5 
shows the functional categories within the 100% similar 
protein sequences in Treponema pallidum subspecies ('#' 
indicates the presence of hypothetical proteins with other 
type of protein according to the databases). 


4.4.2.2. Distribution of proteins (99% similar) based 
on Functional categories 

Using ClustalW software, 13 protein sequences were filtered 
on the basis of 99% similarity. From these 13 sequences, 
10 different types of proteins were categorized. Table 6 
details the presence of each specific type of protein in the 
individual subspecies among the 13 sequences having 
99% similarity. Species C and D showed maximum 
similarity in apolipoprotein N-acyltransferase protein and 
alginate O-acetylation protein (algI). Species A and B have 
spermidine/putrescine ABC superfamily ATP binding 
cassette transporter, ABC protein, and species C and E 
have 30S ribosomal protein S9. Table 6 shows the 
comparative analysis based on proteins present in 
Treponema pallidum subspecies with 99% similarity 
sequences ('#' indicates the presence of hypothetical 
proteins with other type of protein according to the 
databases). 


4.5. Protein Functional Site Analysis 

It is apparent, when studying protein sequence families, that 
some regions have been better conserved than others 
during evolution. These conserved regions are generally 
important for the three dimensional structure and function of 
a protein. By analyzing the constant and variable properties 
of such groups of similar sequences, it is possible to derive 
a signature for a protein family or domain, which 
distinguishes its members from all other unrelated proteins. 
A useful analogy is the use of fingerprints for 
identification: like a fingerprint, a protein signature can be 
used to assign a new protein to a specific family of proteins 
and thus to formulate hypotheses about its function. 


4.5.1. Comparative Functional Sites in Proteins 
(100% Similar) 

The 92 proteins were scanned with PROSITE to predict the 
functional sites and their locations. Of these, 28 gave 
functional hits. The most frequent were NHL repeat proteins, 
recombinase A protein and spermidine/putrescine ABC 
superfamily ATP binding cassette transporter, ABC protein, 
respectively. Table 7 shows the predicted functional sites in 
the total of 92 proteins. 


Figure 6 
Genome map of Treponema pallidum str. CDC2 


Table 2 Genome atlas annotations 

Genome Atlas Annotations | Range | A | B | C | D | E 
Intrinsic Curvature | Start | 0.155 | 0.155 | 0.156 | 0.156 
Intrinsic Curvature | End | 0.205 | 0.205 | 0.205 | 0.205 
Stacking Energy | Start | -8.707 | -8.707 | -8.705 | -8.705 | -8.706 
Stacking Energy | End | -7.879 | -7.879 | -7.879 | -7.879 | -7.879 
Position Preference | Start | 0.134 | 0.134 | 0.134 | 0.134 
Position Preference | End | 0.145 | 0.146 | 0.146 | 0.145 
Global Direct Repeats | Start | 5 | 5 | 5 | 5 | 5 
Global Direct Repeats | End | 75 | 75 | 75 | 75 
Global Inverted Repeats | Start | 5 | 5 | 5 | 5 | 5 
Global Inverted Repeats | End | 75 | 75 | 75 | 75 | 75 
GC Skew | Start | -0.194 | -0.194 | -0.194 | -0.194 | -0.193 
GC Skew | End | 0.201 | 0.201 | 0.201 | 0.201 | 0.2 
Percent AT | Start | 0.2 | 0.2 | 0.2 | 0.2 
Percent AT | End | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 
A Content | Start | 0.167 | 0.167 | 0.167 | 0.167 | 0.167 
A Content | End | 0.304 | 0.304 | 0.304 | 0.303 | 0.304 
T Content | Start | 0.168 | 0.168 | 0.168 | 0.168 
T Content | End | 0.305 | 0.306 | 0.306 | 0.305 
G Content | Start | 0.163 | 0.163 | 0.163 | 0.164 | 0.163 
G Content | End | 0.369 | 0.369 | 0.368 | 0.368 | 0.368 
C Content | Start | 0.152 | 0.152 | 0.152 | 0.152 | 0.152 
C Content | End | 0.372 | 0.372 | 0.372 | 0.372 | 0.372 
AAAA | Start | -0.007 | -0.007 | -0.007 | -0.007 | -0.007 
AAAA | End | 0.056 | 0.056 | 0.056 | 0.056 | 0.056 
TTTT | Start | -0.007 | -0.007 | -0.007 | -0.006 | -0.007 
TTTT | End | 0.055 | 0.055 | 0.055 | 0.055 
GGGG | Start | -0.017 | -0.017 | -0.017 | -0.017 | -0.007 
GGGG | End | 0.046 | 0.046 | 0.046 | 0.046 | 0.045 
CCCC | Start | -0.016 | -0.016 | -0.016 | -0.016 | -0.016 
CCCC | End | 0.044 | 0.044 | 0.044 | 0.044 | 0.044 
AT Skew | Start | -0.113 | -0.112 | -0.113 | 0.112 | -0.113 
AT Skew | End | 041 | 041 | 041 | 0.109 
Direct Repeats | Start | 5.555 | 5.562 | 5.571 | 5.582 | 5.561 
Direct Repeats | End | 6.201 | 6.194 | 6.184 | 6.168 | 6.195 
Simple Repeats | Start | 4.215 | 4.205 | 4.207 | 4.22 | 4.215 
Simple Repeats | End | 4.615 | 4.627 | 4.624 | 4.608 | 4.614 


Table 3 Total number of protein Sequences with 
aligned score of 100% of five Treponema pallidum 
subspecies 


Table 4 Total number of protein Sequences with 
aligned score of 99% of five Treponema pallidum 
subspecies 


4.5.2. Comparative Functional Sites in Proteins 
(99% Similar) 
Table 5 Functional categories within the 100% similar protein sequences in 
Treponema pallidum subspecies 

Name of Proteins 

Replication initiation protein 
DNA directed domain 
Chromosomal domain 
Chromosomal DnaA domain 
DnaJ domain 
50S Ribosomal protein 
Spermidine putrescine import ATP-binding 
ABC Superfamily, cassette transporter, ABC protein 
PotA 
Aspartyl/glutamyl-tRNA amidotransferase 
Subunit A 
Subunit B 
Subunit C 
SecD domain protein 
Preprotein translocase subunit SecD 
Protein export membrane protein SecD 
Transcription termination factor Rho 
GTP proteins 
GTP-dependent nucleic acid binding protein EngD 
GTP-binding protein YchF 
DNA repair protein RadA 
Lipoprotein 
lipoprotein 
17 kDa 
17 kDa Tpp17 
Copper resistance NlpE 
DHH superfamily protein 
Subfamily 1 
Superfamily phosphoesterase 
Glycyl-tRNA protein 
Synthetase 
Ligase 
Glutamyl-tRNA protein 
Synthetase 
Ligase 
Putative protein 
Radical SAM domain 
Smr domain 
Septum formation initiator subfamily 
Bat family transcriptional regulator 
Type-S pantothenate kinase 
Esterase/lipase 
Ethanolaminephosphotransferase 
sn-1,2-diacylglycerol cholinephosphotransferase 
Carboxylesterase (est) 
Biosynthesis proteins 
Spore coat polysaccharide biosynthesis protein (spsE) 
N-acetylneuraminate synthase 
Aminopeptidase 
Methionyl aminopeptidase 
Methionine aminopeptidase 
MATE family multi antimicrobial extrusion protein OR 

Table 6 Comparative analysis based on proteins present in Treponema pallidum
subspecies with 99% similar sequences.


Sr. No | Name of Proteins | A | B | C | D | E


Spermidine/putrescine ABC superfamily ATP binding cassette transporter, ABC protein] + YF | ft | 


HD domain containing protein P fete ft 7 


Putative Domain protein 


Putative radical SAM domain protein + 

Lysine 2,3-aminomutase | [| [ | f+] 
7 [PPR domaconommaproten Cd) 
5 |DNA repair protein 

RadA +

Sms + 
SS A 
A 
i [Rate Cacoutavon pretend 
9 |Nuclease subunit proteins 

ATP-dependent nuclease, subunit A + +
#


Exodeoxyribonuclease V beta subunit


fopobetet tay 


indicates the presence of hypothetical proteins with other type of protein


The 13 proteins were scanned with PROSITE to predict their
functional sites and the locations of those sites. Of these,
7 functional hits were obtained. The major hits were the
ATP-binding cassette, ABC transporter-type domain and the
ATP-dependent nuclease, subunit A (Table 8: Predicted
functional sites in a total of 13 proteins).
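To illustrate what a signature scan of this kind does, the following is a minimal sketch that converts a PROSITE-style pattern into a regular expression and reports 1-based match positions. It is not the PROSITE scan tool used in this work. The function names `prosite_to_regex` and `scan` and the toy sequence are hypothetical, and the sketch handles only pattern-type entries (such as the P-loop motif `[AG]-x(4)-G-K-[ST]`), not profile entries such as PS50893, which need a scoring matrix rather than a regular expression.

```python
import re

# One pattern element: x, a residue, an allowed set [..] or a forbidden set {..},
# optionally followed by a repeat count (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""   # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""     # C-terminal anchor
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                                    # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"                # forbidden residues
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def scan(sequence: str, pattern: str):
    """Return (start, end) positions, 1-based and inclusive, of all matches."""
    # A lookahead is used so that overlapping matches are also reported.
    compiled = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)))
            for m in compiled.finditer(sequence)]

if __name__ == "__main__":
    toy_protein = "MKTAGPSGSGKSTLLR"  # made-up sequence, for illustration only
    print(scan(toy_protein, "[AG]-x(4)-G-K-[ST]"))
```

Positions are reported in the same "start to end" form as the Position column of Tables 7 and 8.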


5. CONCLUSION 


A comparative analysis of five subspecies of Treponema
pallidum was performed to study their similarities and
differences by comparing their genomic properties and
protein content. Using bioinformatics tools and software,
we were able to identify characters that differentiate the
subspecies of Treponema. The genome sequences of the five
subspecies of T. pallidum were retrieved from NCBI, namely
Treponema pallidum DAL1 (species A), Treponema pallidum
SS14 (species B), Treponema pallidum str. Chicago
(species C), Treponema pallidum str. Nichols (species D)
and Treponema pallidum str. CDC2 (species E). The genomes
of species A, B, C, D and E were analysed by comparing
about 5166 sequences obtained from NCBI. The genomic
properties and the various types of proteins, together with
their functional sites, were studied and compared among the
five species to gather information on their similarities
and differences.
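As an illustration of how a genomic property such as the AT skew in Table 2 can be computed, here is a minimal sketch using the standard definitions, skew = (A - T)/(A + T) and, analogously, (G - C)/(G + C). The function name `nucleotide_skews` and the toy sequence are assumptions made for this example. The sketch does not reproduce the software used in this study or the "Start" and "End" values reported in the table.

```python
from collections import Counter

def nucleotide_skews(sequence: str):
    """Return (gc_skew, at_skew) for a DNA sequence.

    gc_skew = (G - C) / (G + C) and at_skew = (A - T) / (A + T).
    A skew is reported as 0.0 when its denominator is zero.
    """
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    return gc_skew, at_skew

if __name__ == "__main__":
    toy_dna = "GGGCAAT"  # made-up sequence, for illustration only
    gc, at = nucleotide_skews(toy_dna)
    print(f"GC skew = {gc:.3f}, AT skew = {at:.3f}")
```

Applied to each of the five genomes in turn, a function like this would give one value per species that can be compared across a table row.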
Multiple sequence alignment of the 5166 * 5166 sequences
was performed using ClustalW. The aligned sequences were
filtered to retain those with alignment scores of 100% and
99%, based on the similar proteins present in the five
subspecies. This yielded 92 sequences with 100% identity
and 13 sequences with 99% identity. The functional sites
were obtained using the PROSITE scan tool. 28 functional
hits among the 100% identical and 7 among the 99% identical
protein sequences were found to have similar functions. The
detailed results are given in the tables mentioned below to
support interpretation and discussion.

According to Table 1, we can infer that species B and D
have a similar number of genes and proteins, whereas
species A, C and E show similar genomic properties. From
Table 2, it can be observed that all DNA properties are
similar in all five species. Species B and E show similar
gradation in direct repeats, whereas species A and E and
species B and C have identical inverted repeats. Multiple
sequence alignment using ClustalW screened the sequences of
all five subspecies and showed that species C and D have 13
identical sequences of functional proteins with a 100%
alignment score and species D and E have 5 identical
sequences with 99% similarity. The details are given in
Tables 3 to 6. Tables 7 and 8 give information about the
functional sites of 28 proteins (100% identical) and 7
proteins (99% identical). The most analogous proteins are
NHL repeat proteins, recombinase A protein and
Spermidine/putrescine ABC superfamily ATP binding cassette
transporter, ABC protein among the 100% identical
sequences, and ATP-binding cassette, ABC transporter-type
domain and ATP-dependent nuclease, subunit A among the 99%
identical sequences.

Comparative genomic analysis revealed that species B and D
and species A and E are closely related to each other in
their genomic composition, while species C, D and E are
similar in functional protein content. By means of local
sequence similarity searches, protein profile searches and
analysis of the functional categories of 100% and 99%
similar proteins, we have conducted a detailed comparative
analysis of the genomes of T. pallidum. From the level of
conservation between functional classes and evolutionary
measures, it was possible to characterize, in functional
terms, the nature of the divergence between the five
spirochetes and the common and distinct aspects of their
physiological strategies.
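The filtering of aligned sequences into 100% and 99% groups described in the conclusion can be sketched as follows. This is a simplified illustration, not the ClustalW scoring itself. It assumes the sequences are already aligned to equal length with `-` as the gap character, and it defines identity as matches divided by the number of columns in which neither sequence has a gap. The function names and the toy records are hypothetical.

```python
from itertools import combinations

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # gap columns are not counted (an assumption)
        compared += 1
        matches += res_a == res_b
    return 100.0 * matches / compared if compared else 0.0

def bin_pairs(aligned: dict):
    """Split all sequence pairs into 100% identical and 99%-to-<100% groups."""
    identical, near_identical = [], []
    for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2):
        identity = percent_identity(seq_a, seq_b)
        if identity == 100.0:
            identical.append((name_a, name_b))
        elif identity >= 99.0:
            near_identical.append((name_a, name_b))
    return identical, near_identical

if __name__ == "__main__":
    # Made-up 100-residue records; labels A to C stand in for species.
    records = {
        "A_protein": "M" + "K" * 99,
        "B_protein": "M" + "K" * 99,
        "C_protein": "M" + "K" * 98 + "R",
    }
    same, near = bin_pairs(records)
    print("100%:", same)
    print("99%:", near)
```

Comparing all pairs grows quadratically with the number of sequences, which is consistent with the 5166 * 5166 comparison size quoted above.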
Table 7 Predicted functional sites in a total of 92 proteins


Sequence ID | Position | Protein Name | Functional Site
Species A
csi YP, 2
csi YP.
csi YP. OS
csi YP. OZ
csi YP, OS
cdsid_YP_005224279.1 | 170 to 204 | Aspartyl/glutamyl-tRNA amidotransferase subunit A
Species B
cdsid_YP_001938415.1 | ‘tot? | Lipoprotein
cdsid_YP_001939680.1 | 43 to 212 | Recombinase A
21 289 | Recombinase A
cdsid_YP_001939849.1 | 22 to 94 | Replication initiation protein DnaA domain
Species C
cdsid_YP_005631034.1 | 262 to 270 | Phosphoenolpyruvate carboxykinase
cdsid_YP_005631313.1 | 317 to 326 | DHH superfamily protein, subfamily 1
cdsid_YP_005631530.1 | 83 to 105 | sn-1,2-diacylglycerol cholinephosphotransferase
cdsid_YP_005631531.1 | 14 to 341 | Glycyl-tRNA synthetase
esi YP 00565384
esi YPO0S68724
cdsid_YP_005631875.1 | 68 to 86 | 30S ribosomal protein S9
cdsid_YP_005631876.1 | 10 to 127 | 50S ribosomal protein L13
Species D
cdsid_NP_218696.1 | 34 to 55 | 50S ribosomal protein L31 | PS01143, RIBOSOMAL_L31 Ribosomal protein L31 signature
cdsid_NP_219001.1 | 312 to 374 | N-acetylneuraminate synthase
Species E
cdsid_YP_006230208.1 | 40 to 73 | NHL repeat protein
40 to 73 | NHL repeat protein
149 to 179 | NHL repeat protein
testo 223 | NHL repeat protein | PS51125, NHL NHL repeat profile
zarto one | NHL repeat protein | PS51125, NHL NHL repeat profile
zreto6 | NHL repeat protein | PS51125, NHL NHL repeat profile
ci VP. 0525064


Table 8 Predicted functional sites in a total of 13 proteins


Sequence ID | Position | Protein Names | Functional Site
Species A
cdsid_YP_005223894.1 | 5 to 370 | Spermidine/putrescine import ATP-binding protein PotA family | PS51305, POTA Spermidine/putrescine import ATP-binding protein PotA family profile
6 to 236 | ATP-binding cassette, ABC transporter-type domain | PS50893, ABC_TRANSPORTER_2 ATP-binding cassette, ABC transporter-type domain profile
136 to 150 | ATP-binding cassette, ABC transporter-type domain | PS00211, ABC_TRANSPORTER_1 ABC transporters family signature
Species B
rl [rit
Species C
cdsid_YP_005631875.1 | 68 to 86 | 30S ribosomal protein S9 | PS00360, RIBOSOMAL_S9 Ribosomal protein S9 signature
Species D
cdsid_NP_218693.1
cdsid_NP_219333.1
566 to 850 | ATP-dependent nuclease, subunit A | PS51217, UVRD_HELICASE_CTER UvrD-like DNA helicase C-terminal domain profile
Species E
rl a a
Protein functional profile searches resulted in the
identification of diverged and common components that might
mediate interactions between the spirochetes and host cells
or the extracellular matrix. It appears possible to tentatively
understand the divergent mechanisms underlying their
invertebrate pathogenesis and virulence and adaptation to
their specific niches.

Comparative analysis of five subspecies of the spirochete
Treponema pallidum was performed.

1. The parameters studied include comparison of genomic
and protein function similarities and differences on the
basis of genome sequences obtained from NCBI.

2. This study enables further analysis of the species
to understand their growth and development, and further
research on their pathogenicity, in support of the
treatment and prevention of the diseases caused by
these organisms.

Finally, our strategy has demonstrated the discriminatory
power of computational tools and techniques, with Support
Vector Machine classification as the sequence-based
comparative analysis, to discriminate proteins and their
functions within species of pathogenic microorganisms
with high reliability and accuracy.


FUTURE ISSUES 


Sequence variants could readily be used for molecular typing and identification of these Treponema pallidum strains and, as data from additional
Treponema pallidum genomes accumulate, for epidemiologic applications and clinical discrimination between reinfection and reactivation of disease.
Moreover, the ability to now sequence numerous T. pallidum strains, especially those showing different degrees of virulence, will allow phenotype to be
correlated with sequence. This is a significant development for an organism with an important public health impact, but for which standard bacterial genetic
methods are untenable. We hope that this work can be extended by exploring further sequence properties, as well as more diverse organisms, to elucidate
the underlying host association and evolutionary mechanisms.